
zvol: allow rename of in use ZVOL dataset #8371

Merged: 1 commit merged into openzfs:master from the issue-6263 branch on Feb 22, 2019

Conversation

loli10K
Contributor

@loli10K loli10K commented Feb 3, 2019

Motivation and Context

Fix #6263

Description

NOTE: this is a tentative fix.

While ZFS allows renaming of in-use ZVOLs at the DSL level without issues, the ZVOL layer does not correctly update the renamed dataset if the device node is open (zv->zv_open_count > 0): trying to access the stale dataset name, for instance during a zfs receive, will cause the following failure:

VERIFY3(zv->zv_objset->os_dsl_dataset->ds_owner == zv) failed ((null) == ffff8800dbb6fc00)
PANIC at zvol.c:1255:zvol_resume()
Showing stack for process 1390
CPU: 0 PID: 1390 Comm: zfs Tainted: P           O  3.16.0-4-amd64 #1 Debian 3.16.51-3
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 0000000000000000 ffffffff8151ea00 ffffffffa0758a80 ffff88028aefba30
 ffffffffa0417219 ffff880037179220 ffffffff00000030 ffff88028aefba40
 ffff88028aefb9e0 2833594649524556 6f5f767a3e2d767a 6f3e2d7465736a62
Call Trace:
 [<0>] ? dump_stack+0x5d/0x78
 [<0>] ? spl_panic+0xc9/0x110 [spl]
 [<0>] ? mutex_lock+0xe/0x2a
 [<0>] ? zfs_refcount_remove_many+0x1ad/0x250 [zfs]
 [<0>] ? rrw_exit+0xc8/0x2e0 [zfs]
 [<0>] ? mutex_lock+0xe/0x2a
 [<0>] ? dmu_objset_from_ds+0x9a/0x250 [zfs]
 [<0>] ? dmu_objset_hold_flags+0x71/0xc0 [zfs]
 [<0>] ? zvol_resume+0x178/0x280 [zfs]
 [<0>] ? zfs_ioc_recv_impl+0x88b/0xf80 [zfs]
 [<0>] ? zfs_refcount_remove_many+0x1ad/0x250 [zfs]
 [<0>] ? zfs_ioc_recv+0x1c2/0x2a0 [zfs]
 [<0>] ? dmu_buf_get_user+0x13/0x20 [zfs]
 [<0>] ? __alloc_pages_nodemask+0x166/0xb50
 [<0>] ? zfsdev_ioctl+0x896/0x9c0 [zfs]
 [<0>] ? handle_mm_fault+0x464/0x1140
 [<0>] ? do_vfs_ioctl+0x2cf/0x4b0
 [<0>] ? __do_page_fault+0x177/0x410
 [<0>] ? SyS_ioctl+0x81/0xa0
 [<0>] ? async_page_fault+0x28/0x30
 [<0>] ? system_call_fast_compare_end+0x10/0x15
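
The failed VERIFY asserts that, while the device node is open, the suspended zvol is still the owner of the dataset backing its objset. A minimal, hypothetical sketch of that assumption (simplified and renamed for illustration, not the actual zvol_resume() code):

static int
zvol_resume_assumption_sketch(zvol_state_t *zv)
{
        objset_t *os;
        int error;

        /* Re-resolve the dataset from the cached name; after a rename with
         * the device node open, zv_name may no longer name this dataset. */
        error = dmu_objset_hold(zv->zv_name, FTAG, &os);
        if (error != 0)
                return (error);

        /* The assumption checked by the VERIFY in the trace above: the
         * suspended zvol must still own the resolved dataset. */
        VERIFY3P(os->os_dsl_dataset->ds_owner, ==, zv);

        dmu_objset_rele(os, FTAG);
        return (0);
}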

How Has This Been Tested?

Tested with reproducer added to the ZFS Test Suite.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

@loli10K loli10K added the Status: Work in Progress Not yet ready for general review label Feb 3, 2019
@loli10K loli10K force-pushed the issue-6263 branch 2 times, most recently from 169c1b9 to 865e1cf on February 5, 2019 18:48
@codecov

codecov bot commented Feb 6, 2019

Codecov Report

Merging #8371 into master will increase coverage by 0.24%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #8371      +/-   ##
==========================================
+ Coverage   78.43%   78.67%   +0.24%     
==========================================
  Files         380      380              
  Lines      115901   115898       -3     
==========================================
+ Hits        90909    91187     +278     
+ Misses      24992    24711     -281
Flag      Coverage Δ
#kernel   79.06% <ø> (+0.02%) ⬆️
#user     67.59% <ø> (+0.7%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f545b6a...f55db45.

@loli10K loli10K added Status: Code Review Needed Ready for review and testing and removed Status: Work in Progress Not yet ready for general review labels Feb 6, 2019
@loli10K
Contributor Author

loli10K commented Feb 6, 2019

This change fixes the failed assertion and does not seem to break anything (at least on the surface). We need to investigate if/when this change can become an issue (e.g. during zfs recv); see #6729.

@loli10K
Contributor Author

loli10K commented Feb 7, 2019

I tried renaming a ZVOL during an incremental zfs recv that was being delayed (mdelay(5s) via SystemTap) in dmu_recv_end(), i.e. after a successful zvol_suspend() but before the zvol_resume():

https://github.com/zfsonlinux/zfs/blob/9b626c126e78cdc36200b66c7cd1dc6a06cf400d/module/zfs/zfs_ioctl.c#L4645-L4647
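
For context, a condensed and hypothetical sketch of the window being exercised (names taken from the receive path, but this is not the literal zfs_ioc_recv_impl() code): the asynchronous rename task can run anywhere between zvol_suspend() and zvol_resume() while dmu_recv_end() is delayed.

static int
recv_end_window_sketch(const char *tofs, dmu_recv_cookie_t *drc)
{
        zvol_state_t *zv;
        int error;

        /* The zvol is looked up and suspended by its current name. */
        zv = zvol_suspend(tofs);

        /* Artificially delayed by 5 seconds here via SystemTap. */
        error = dmu_recv_end(drc, NULL);

        /* Resume must cope with a rename that happened in the window. */
        if (zv != NULL)
                (void) zvol_resume(zv);

        return (error);
}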

This system is running code from this PR:

ctime(gettimeofday_s())   uid() ppid()  pid()     execname()
---------------------------------------------------------------------------------
Thu Feb  7 18:00:57 2019      0  25941   7146            zfs -> zfs_ioc_recv_impl tofs="testpool/zvol-recv" tosnap="snap2" origin=          (null) ...
Thu Feb  7 18:00:57 2019      0  25941   7146            zfs  -> dmu_recv_stream drc={.drc_ds=0xffff880022117000, .drc_drr_begin=0xffff880012023d08, .drc_drrb=0xffff880012023d10, .drc_tofs="testpool/zvol-recv", .drc_tosnap="snap2", ...
Thu Feb  7 18:00:57 2019      0  25941   7146            zfs  <- dmu_recv_stream return=0
Thu Feb  7 18:00:57 2019      0  25941   7146            zfs  -> zvol_suspend name="testpool/zvol-recv"
Thu Feb  7 18:00:57 2019      0  25941   7146            zfs   -> zvol_find_by_name name="testpool/zvol-recv" mode=1
Thu Feb  7 18:00:57 2019      0  25941   7146            zfs    -> zvol_name_hash name="testpool/zvol-recv"
Thu Feb  7 18:00:57 2019      0  25941   7146            zfs    <- zvol_name_hash return=5434585174035895019
Thu Feb  7 18:00:57 2019      0  25941   7146            zfs    -> zvol_find_by_name_hash name="testpool/zvol-recv" hash=5434585174035895019 mode=1
Thu Feb  7 18:00:57 2019      0  25941   7146            zfs    <- zvol_find_by_name_hash return={.zv_name="testpool/zvol-recv", .zv_volsize=10485760, .zv_volblocksize=8192, ...
Thu Feb  7 18:00:57 2019      0  25941   7146            zfs   <- zvol_find_by_name return={.zv_name="testpool/zvol-recv", .zv_volsize=10485760, .zv_volblocksize=8192, ...
Thu Feb  7 18:00:57 2019      0  25941   7146            zfs  <- zvol_suspend return={.zv_name="testpool/zvol-recv", .zv_volsize=10485760, .zv_volblocksize=8192, ...
Thu Feb  7 18:00:57 2019      0  25941   7146            zfs  -> dmu_recv_end drc={.drc_ds=0xffff880022117000, .drc_drr_begin=0xffff880012023d08, .drc_drrb=0xffff880012023d10, .drc_tofs="testpool/zvol-recv", .drc_tosnap="snap2", ... owner=0x0
Thu Feb  7 18:01:16 2019      0      2   6599         z_zvol -> zvol_task_cb param=0xffff88002218bc00
Thu Feb  7 18:01:16 2019      0      2   6599         z_zvol  -> zvol_rename_minors_impl oldname="testpool/zvol-recv" newname="testpool/zvol-recv-new"
Thu Feb  7 18:01:16 2019      0      2   6599         z_zvol   -> zvol_name_hash name="testpool/zvol-recv-new"
Thu Feb  7 18:01:16 2019      0      2   6599         z_zvol   <- zvol_name_hash return=15829768060993764661
Thu Feb  7 18:01:16 2019      0      2   6599         z_zvol  <- zvol_rename_minors_impl 
Thu Feb  7 18:01:16 2019      0      2   6599         z_zvol <- zvol_task_cb 
Thu Feb  7 18:01:16 2019      0      2   6599         z_zvol -> zvol_task_cb param=0xffff880011ba1000
Thu Feb  7 18:01:16 2019      0  25941   7146            zfs  <- dmu_recv_end return=2
Thu Feb  7 18:01:16 2019      0  25941   7146            zfs  -> zvol_resume zv={.zv_name="testpool/zvol-recv-new", .zv_volsize=10485760, ...
Thu Feb  7 18:01:16 2019      0  25941   7146            zfs  <- zvol_resume return=0
Thu Feb  7 18:01:16 2019      0  25941   7146            zfs <- zfs_ioc_recv_impl return=2
Thu Feb  7 18:01:16 2019      0      2   6599         z_zvol <- zvol_task_cb 

zfs recv exits with errno=ENOENT, the ZVOL is correctly renamed to "zvol-recv-new", and the system seems to be fine:

root@linux:/usr/src/zfs# zfs send -i $POOLNAME/zvol@snap1 $POOLNAME/zvol@snap2 | zfs recv $POOLNAME/zvol-recv
cannot open 'testpool/zvol-recv': dataset does not exist
cannot receive incremental stream: dataset does not exist
cannot open 'testpool/zvol-recv': dataset does not exist
root@linux:/usr/src/zfs# zfs list
NAME                     USED  AVAIL     REFER  MOUNTPOINT
testpool                 675K  55.3M       24K  /testpool
testpool/zvol             12K  55.3M       12K  -
testpool/zvol-recv-new    12K  55.3M       12K  -
root@linux:/usr/src/zfs# 

While ZFS allows renaming of in-use ZVOLs at the DSL level without
issues, the ZVOL layer does not correctly update the renamed dataset if
the device node is open (zv->zv_open_count > 0): trying to access the
stale dataset name, for instance during a zfs receive, will cause the
following failure:

VERIFY3(zv->zv_objset->os_dsl_dataset->ds_owner == zv) failed ((null) == ffff8800dbb6fc00)
PANIC at zvol.c:1255:zvol_resume()
Showing stack for process 1390
CPU: 0 PID: 1390 Comm: zfs Tainted: P           O  3.16.0-4-amd64 openzfs#1 Debian 3.16.51-3
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 0000000000000000 ffffffff8151ea00 ffffffffa0758a80 ffff88028aefba30
 ffffffffa0417219 ffff880037179220 ffffffff00000030 ffff88028aefba40
 ffff88028aefb9e0 2833594649524556 6f5f767a3e2d767a 6f3e2d7465736a62
Call Trace:
 [<0>] ? dump_stack+0x5d/0x78
 [<0>] ? spl_panic+0xc9/0x110 [spl]
 [<0>] ? mutex_lock+0xe/0x2a
 [<0>] ? zfs_refcount_remove_many+0x1ad/0x250 [zfs]
 [<0>] ? rrw_exit+0xc8/0x2e0 [zfs]
 [<0>] ? mutex_lock+0xe/0x2a
 [<0>] ? dmu_objset_from_ds+0x9a/0x250 [zfs]
 [<0>] ? dmu_objset_hold_flags+0x71/0xc0 [zfs]
 [<0>] ? zvol_resume+0x178/0x280 [zfs]
 [<0>] ? zfs_ioc_recv_impl+0x88b/0xf80 [zfs]
 [<0>] ? zfs_refcount_remove_many+0x1ad/0x250 [zfs]
 [<0>] ? zfs_ioc_recv+0x1c2/0x2a0 [zfs]
 [<0>] ? dmu_buf_get_user+0x13/0x20 [zfs]
 [<0>] ? __alloc_pages_nodemask+0x166/0xb50
 [<0>] ? zfsdev_ioctl+0x896/0x9c0 [zfs]
 [<0>] ? handle_mm_fault+0x464/0x1140
 [<0>] ? do_vfs_ioctl+0x2cf/0x4b0
 [<0>] ? __do_page_fault+0x177/0x410
 [<0>] ? SyS_ioctl+0x81/0xa0
 [<0>] ? async_page_fault+0x28/0x30
 [<0>] ? system_call_fast_compare_end+0x10/0x15

Signed-off-by: loli10K <[email protected]>
/* Check removed by this change: renaming was previously skipped while the device node was open. */
if (zv->zv_open_count > 0) {
        mutex_exit(&zv->zv_state_lock);
        continue;
}
Contributor

According to the comment in #4344 (comment) this logic was added to make sure the zv_name wasn't updated while being accessed by zvol_ioctl(). This was a concern because it was accessed outside the zv_state_lock. Since this is no longer the case, this should be fine. This does mean the udev symlink name may change out from underneath us, which is arguably what we want to happen.

Function zvol_rename_minors() does not check the reference count before
modifying the zv_name, which is not good since zvol_ioctl(), for instance,
accesses zv_name without holding zvol_state_lock (but with zv_open_count
being positive).
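
A minimal, hypothetical sketch of the locking that makes removing the check safe (the zv_name and zv_state_lock fields are real, the helper functions are made up for illustration): both readers and the rename path now touch zv_name only while holding zv_state_lock.

/* Reader side, e.g. an ioctl-style path that needs the current name. */
static void
zvol_copy_name_sketch(zvol_state_t *zv, char *buf, size_t buflen)
{
        mutex_enter(&zv->zv_state_lock);
        strlcpy(buf, zv->zv_name, buflen);
        mutex_exit(&zv->zv_state_lock);
}

/* Writer side, e.g. the rename path updating the cached name. */
static void
zvol_set_name_sketch(zvol_state_t *zv, const char *newname)
{
        mutex_enter(&zv->zv_state_lock);
        strlcpy(zv->zv_name, newname, sizeof (zv->zv_name));
        mutex_exit(&zv->zv_state_lock);
}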

@behlendorf
Contributor

@bprotopopov would you mind commenting on this PR?

Contributor

@tcaputi tcaputi left a comment

This change looks good to me. It definitely fixes the problem and the added test case is great. It also makes the behavior more consistent with how filesystems work.

There is an issue here, however. Although this code helps ensure that the zvol name matches the dataset name, the root cause seems to be that the zvol suspend/resume code relies on this being true, when there are actually several ways it can get out of sync. At least a few cases where I believe this might still occur are:

  • If the zvol_inhibit_dev tunable is set, this renaming code won't run.
  • zvol_rename_minors_impl() is called from zvol_task_cb(), which is run via taskqs in several places. None of these callers seem to really care whether the taskq_dispatch() was successful (see the sketch after this list).
  • The zvol renaming code all runs asynchronously via taskqs. I don't have a specific example, but I believe there might be race conditions between suspending and the renaming tasks running, especially since zvol_suspend() drops the zv_state_lock before expecting zvol_resume() to be called.
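
A hypothetical, simplified sketch of that dispatch pattern (helper names follow zvol.c, but this is illustrative rather than the literal code):

/* Queue an asynchronous rename of the minors; the dispatch result is
 * effectively ignored, so a failed dispatch silently drops the rename. */
static void
zvol_queue_rename_sketch(spa_t *spa, const char *oldname, const char *newname)
{
        zvol_task_t *task;

        task = zvol_task_alloc(ZVOL_ASYNC_RENAME_MINORS, oldname, newname, ~0ULL);
        if (task == NULL)
                return;

        (void) taskq_dispatch(spa->spa_zvol_taskq, zvol_task_cb, task, TQ_SLEEP);
}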

I think we can solve this by changing zvol_suspend() so that it takes an objset instead of a name. I think we can then get the zvol via the dataset's owner. By doing that we remove the dependency on the zvol name matching the dataset name.
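
A hypothetical sketch of that suggestion (not part of this PR; the function name is made up): resolve the zvol through the dataset owner pointer, the same relationship the failed VERIFY in zvol_resume() checks, instead of through the name.

/* Suspend the zvol backing this objset without going through its name,
 * so a concurrent rename cannot invalidate the lookup. */
static zvol_state_t *
zvol_suspend_by_objset_sketch(objset_t *os)
{
        /* For an open zvol the dataset owner is the zvol_state_t itself
         * (the invariant asserted in zvol_resume()). */
        zvol_state_t *zv = os->os_dsl_dataset->ds_owner;

        if (zv == NULL)
                return (NULL);

        mutex_enter(&zv->zv_state_lock);
        /* From here, proceed as the existing zvol_suspend() does:
         * take zv_suspend_lock, mark the zvol suspended, and so on. */
        mutex_exit(&zv->zv_state_lock);

        return (zv);
}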

I have a feeling this problem might unfortunately be a lot more systemic and a lot more than @loli10K originally wanted to bite off. Maybe we need a separate PR for this, but I think we really need to rethink how the code relies on the zvol name and all of the callers of zvol_find_by_name().

@behlendorf it's up to you what we should do with the work that's been done here. I'm willing to approve it if we want to save this work for another PR.

@behlendorf
Contributor

@tcaputi I agree there's a larger issue to tackle here regarding lookups by zvol name. Doing this with the objset instead of the name as you suggested is definitely worth exploring as a more complete solution. I'd like to move forward with this PR as is, which resolves a real existing issue, and then we can follow up with a more comprehensive improvement.

Contributor

@tcaputi tcaputi left a comment

LGTM (other than style)

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Feb 21, 2019
@behlendorf behlendorf merged commit c44a3ec into openzfs:master Feb 22, 2019
@bprotopopov
Contributor

bprotopopov commented Feb 23, 2019 via email

Labels
Status: Accepted Ready to integrate (reviewed, tested)
Development

Successfully merging this pull request may close these issues.

PANIC at zvol.c:1165:zvol_resume()